(57) A method and system for providing a group of
users with a search facility for information stored at a 
plurality of addressable logical locations is described. A 
database of index information for information stored at a
plurality of logical locations is provided. The
index information includes the address of the logical lo- 
cations and the descriptive information for information 
stored at each logical location. The descriptive informa- 
tion matches a common profile of interest to the group 
of users. Accessing and retrieval of stored information
by a user in the group is monitored and descriptive in-
formation is derived using the retrieved information. The
relevance of the retrieved information is determined by 
comparing the descriptive information to the profile. If 
the retrieved information is determined to be relevant, 
the database is updated using the address and the de- 
scriptive information of the determined relevant re- 
trieved information. 



A method and system for updating a searchable database of descriptive information
describing information stored at a plurality of addressable logical locations

Description

[0001] The present invention generally relates to a 
method and system for providing a group of users with 
a search facility for information stored at a plurality of 
addressable logical locations. In particular the present 
invention relates to a method and system for updating 
a database searchable by a group of users and contain- 
ing descriptive information and addresses for the stored 
information. 

[0002] The provision of a search capability for infor- 
mation stored at a plurality of addressable locations is 
a problem when the amount of information becomes 
large and distributed. It is known in the art to provide a 
database of index information which is searchable to en- 
able the address of the stored information to be located 
based on descriptive information stored with the ad- 
dress in the database. 

[0003] With the prevalent use of the Internet, and in 
particular the World Wide Web, the problem for search- 
ing for and retrieving information in the form of web pag- 
es is a problem that has received much attention in the 
art. Many search engines have been developed which 
search and catalogue web pages to form a database of 
addresses and descriptive information for those ad- 
dresses. A user is thus able to submit a query to the 
search engine to search the database to retrieve web 
pages best matching the query. 

[0004] The problem with many prior art search en- 
gines is that they try to cover the whole of the World 
Wide Web. This is an almost impossible task in view of 
the fluid nature of the Internet. Also, many of the results 
of the search will not be relevant to the user's interests. 
Further, the requirement for cataloguing the whole of the 
Internet provides a vast burden on the processing power 
required. 

[0005] One prior art system disclosed in US 5931907 
comprises the local storage of information as a distrib- 
uted database by a community of agents. When a page 
is loaded and considered to be of interest to a user, the 
agent can be instructed to catalogue the page and the 
user can add additional user information. Other users of 
agents within the community can be notified of the po- 
tentially interesting information. In this way a community 
of users have access to potentially interesting informa- 
tion distributed across the network. 
[0006] One disadvantage of this arrangement is that 
the information is not held centrally at the database and 
requires each of the agents to communicate with each 
of the other agents within the network. Further, the cat- 
aloguing of web pages is only instigated manually after 
a user has inspected the page. 

[0007] It is an object of the present invention to pro- 
vide an improved search facility for information such as 
web pages to a group of users with common interests. 
[0008] In accordance with one aspect, the present in- 
vention provides a method and system for providing a 
group of users having common interests with a search
facility for information stored in a plurality of addressable
logical locations. A database of index information for
information that is stored at the plurality of logical
locations is provided in which the index information includes
the address of the logical locations and descriptive
information for information stored at each logical location.
The descriptive information matches a common profile
of interest of the group of users. The accessing and
retrieval of stored information by a user in the group is
monitored and descriptive information is derived using
the retrieved information. The relevance of the retrieved
information is determined by comparing the descriptive
information to the profile and if any relevant retrieved
information is determined, the database is updated using
the address and descriptive information of the
determined relevant retrieved information.
[0009] The present invention can be implemented
in a single apparatus such as a suitably programmed
general purpose computer or dedicated hardware.
However, the present invention is more preferably applicable
to a network wherein the database is provided at a
server, and the accessing and retrieval of stored
information, the monitoring of the accessing and
retrieval, and the deriving of the descriptive information
take place within a client. The address and the derived
descriptive information are then sent to the server for
updating of the database at the server.

[0010] The determining of the relevance of the
retrieved information can take place in the client or in the
server. Preferably the determination takes place in the
client in order to reduce the amount of information
transmitted to the server and to distribute the processing
load.

[0011] In one embodiment an initial request from a
client to access the database at the server is sent and an
agent is downloaded from the server to the client in
response. The agent comprises an autonomous
application which when installed and running on the client
performs the monitoring, determining and sending
processes. The agent thus uses the profile to identify relevant
information to be used to update the database. The
application can be implemented in a multi-tasking
environment in the background.

[0012] In a preferred embodiment, the user of the
client is warned that in order to use the search facility, i.e.
to be able to access the database, the agent must be
downloaded. Access is denied to the database without
the agent being installed on the client. Only when the
user inputs a confirmation which is sent by the client to
the server is the agent downloaded to the client from the
server.

[0013] Thus the trade off by a user for access to the
search facility is that their computer and their
activities are used to contribute towards updating the database.
The user is a member of a group of users who have a
common interest and thus the agent has a profile
representative of the common interest of the group. Thus
for the user to access the database, they allow the
distributed processing of information locations visited in
order to update the database for the common good of the
group.

[0014] The present invention is suited to any system
in which information is stored at addressable logical
locations. The present invention is however particularly
suited to the Internet in which the Internet Protocol is
used and the stored information includes hypertext
mark-up language. The logical addresses thus
comprise Uniform Resource Locators (URLs). In this
embodiment the client will implement a web browser to
access web pages hosted by web servers and the agent
on the client will monitor the accessing and retrieval of
web pages.

[0015] In addition to the monitoring of the pages
actually visited by the client, the agent can also include a
"spidering" capability. Links from the web page accessed
and retrieved can be "spidered" or crawled by the agent
in order to access and retrieve the web pages and
determine descriptive information for further expanding the
updating of the database. The web pages which are
spidered or crawled are processed to determine
descriptive information and the descriptive information is then
analysed to determine the relevance of the page. In this
way index information only for relevant pages is sent to
the server.

[0016] In one embodiment, the database is
periodically checked to see if there are any entries in the
database which have not been recently updated. If there are
any entries which have not recently been updated, the
web page can be accessed and retrieved by a spidering
function at the server. Descriptive information for the
page can then be determined and compared with the
profile to determine if the page is still relevant. If not, the
page is deleted from the database. If the page is relevant,
the entry in the database is updated with the new
descriptive information and a date to show when it was
updated.

[0017] The profile can comprise any information
suitable for defining the common interests of the group of
users. When the stored information includes text such
as web pages, the profile comprises descriptive
information which comprises text. The determination of
relevance can then be performed on a keyword basis by
matching the keywords of the profile to keywords in the
descriptive information. The keyword matching need not
be exact and can be based on lexical matching of
synonyms. As an alternative matching technique, natural
language matching of the text of the profile and the text
of the descriptive information can be used.
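
By way of illustration only, keyword matching that tolerates synonyms might be sketched as follows. The synonym table, the scoring rule, the 0.5 threshold and all function names are assumptions made for this example; the description does not prescribe any particular matching rule.

```python
# Hypothetical synonym table; a real system might consult a thesaurus instead.
SYNONYMS = {
    "car": {"automobile", "vehicle"},
    "engine": {"motor"},
}


def expand(keyword, synonyms=SYNONYMS):
    """Return the lower-cased keyword together with its listed synonyms."""
    key = keyword.lower()
    return {key} | {s.lower() for s in synonyms.get(key, set())}


def relevance_score(profile_keywords, catalogue_keywords, synonyms=SYNONYMS):
    """Fraction of profile keywords found in the catalogue, exactly or by synonym."""
    catalogue = {k.lower() for k in catalogue_keywords}
    profile = list(profile_keywords)
    if not profile:
        return 0.0
    matched = sum(1 for k in profile if expand(k, synonyms) & catalogue)
    return matched / len(profile)


def is_relevant(profile_keywords, catalogue_keywords, threshold=0.5):
    """Treat a page as relevant when enough profile keywords are matched."""
    return relevance_score(profile_keywords, catalogue_keywords) >= threshold
```

In this sketch a catalogue containing "automobile" and "motor" would match a profile containing "car" and "engine" even though no keyword is identical.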
[0018] The present invention can be implemented on
a single apparatus or on a client apparatus and a server
apparatus each comprising a suitably programmed
general purpose computer. Thus the present invention can
be embodied as computer programme code for
controlling a general purpose computer. The computer
programme code can be provided to a general purpose
computer on any suitable carrier medium such as a
storage medium (e.g. floppy disk drive, CD ROM, magnetic
tape or programmable memory device) or a signal (such
as an electrical signal carried over a network such as
the Internet).

[0019] An embodiment of the present invention will 
now be described with reference to the accompanying 
drawings, in which: 

Figure 1 is a schematic diagram of a system in ac- 
cordance with an embodiment of the present inven- 
tion, 

Figure 2 is a flow diagram illustrating the process of 
downloading the agent from the server to the client 
in the embodiment of the present invention, 
Figure 3 is a flow diagram illustrating the process of 
determining and sending descriptive information 
from the client to the server in the embodiment of 
the present invention, 

Figure 4 is a flow diagram illustrating the process in 
the server for updating the database using the re- 
ceived information from the agent in the embodi- 
ment of the present invention, and 
Figure 5 is a flow diagram illustrating the process of 
periodically updating the database in accordance 
with the embodiment of the present invention. 

[0020] Figure 1 schematically illustrates a system in 
accordance with an embodiment of the present inven- 
tion for implementation as a search facility for web pag- 
es over the Internet 50. 

[0021] Clients 60 and 70 are connected to the Internet 
50 in order to access web pages at web servers 30 and 
40. The clients 60 and 70 have respective web browsers 
61 and 71 implemented therein for accessing and re- 
trieving web pages from the web servers 30 and 40. The 
clients 60 and 70 also have respective agents 62 and 
72 loaded therein which have been downloaded to them 
in order to monitor the activity of the respective web 
browsers 61 and 71. The agents are autonomous appli- 
cations which run in the background in a multi-tasking 
environment on the clients 60 and 70 such as in the Win-
dows (registered trade mark) operating system. The
agents 62 and 72 are able to communicate on the Inter- 
net 50 to a search server 10 in order to communicate 
the results of their monitoring activities. 
[0022] The search server 10 is connected to the In- 
ternet 50 to enable the clients 60 and 70 to access a 
search engine 3 via a web server 1 acting as the inter-
face to the clients 60 and 70 using the web browsers
61 and 71. The search engine 3 interfaces to a database 
20 providing a database of index information comprising 
logical addresses and descriptive information. In this 
embodiment the logical addresses comprise the URLs 
of web pages and the descriptive information comprises 
key words taken from the text of the web page (which 
can include the metatags). Thus a client 60 or 70 ac- 
cesses the search server 10 using respective web 
browser 61 or 71 to access the web server 1 which acts
as the interface and communicates with the search en-
gine 3 to search the database 20. Thus the web server
1 and search engine 3 can provide a conventional 
search facility for searching database 20. However, the 
interface to the database 20 differs from conventional 
search engine interfaces in that when an initial request 
from a web browser 61 or 71 is received an agent down- 
load application 2 will detect whether the request comes 
from a client 60 or 70 which has an agent 62 or 72 loaded 
thereon. If not the agent download application 2 will 
cause the web server 1 to warn the user of the web 
browser 61 or 71 that an agent must be downloaded in 
order to access the database 20 using the search en- 
gine 3. If the user inputs an acceptance, this is received 
from the web browser 61 or 71 by the web server 1 and 
passed to the agent download application 2 which 
downloads the agent 62 or 72 to the client 60 or 70. Thus 
when the web browser 61 or 71 next requests access 
to the search engine 3, access is permitted. 
[0023] The search server 10 is also provided with a 
spider application 3 for carrying out the conventional spi- 
dering operation in order to periodically update the da- 
tabase 20. 

[0024] The method of operation of the present inven- 
tion will now be described in more detail with reference 
to the flow diagrams of Figures 2 to 5. 
[0025] Figure 2 is a flow diagram illustrating the proc- 
ess of downloading the agent to the client in order to 
allow access to the search engine. When the client ini- 
tially attempts to connect to the web server, in step S1 
the browser is opened and used to access and retrieve 
a web page at the search server (step S2). The request 
to retrieve the web page from the browser will also in- 
clude a request to access the search engine. In step S4 
the search server detects whether there is an agent at 
the client. If an agent is present, in step S5 the client is
allowed access to the search engine in order to search 
for web pages using keywords etc. in a conventional 
manner. 

[0026] If the agent is not detected at the client (step 
S4), in step S6 a message is sent to the client and dis- 
played to inform the user of the client that the agent must 
be downloaded in order to use the search engine. This 
message can be in the form of a web page with a check 
box for example to enable the user to accept the down- 
loading of the agent in return for access to the search 
engine. In step S7 a user acceptance is then awaited.
If no user acceptance is input, in step S8 the user is
refused access to the search engine. For example, if a 
user selects to decline downloading of the agent, a web 
page can be sent to the web browser to inform the user
that access to the search engine is refused. 
[0027] Once the search server receives the
acceptance from the client, in step S9 the agent download
application 2 downloads the agent to the client. The agent
comprises an autonomous application capable of
running in the background. The agent will include in the
code or as metadata a profile defining the common
interests of the group of users. The profile can comprise
a set of keywords. Once the agent has been
downloaded in step S9, in step S10 the agent is installed on the
client as is conventional in the Windows (trade mark)
operating system. In step S11, when the client is
restarted, the agent runs automatically in the background.
From then on the client is allowed access to the search
engine (step S5). The installation of the agent on the
client causes an icon to be added to the task bar in the
Windows (trade mark) operating system display. Thus
the next time a user wishes to access the search engine,
they can either use the web browser (step S1) or they
can click on the agent icon in the task bar (step S3). If
the agent icon is clicked on in the task bar, the web
browser is launched and directed to access the search
server. Alternatively the agent can include a web
browser interface to act as the search interface for the client
to access the search server to perform a search through
the database 20.
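
The access decision of Figure 2 can be pictured with the following minimal sketch. It is an assumption-laden illustration, not the implementation of the embodiment: the function name, the two callables and the returned strings are all hypothetical, and the way the server detects an agent is left outside the sketch.

```python
def handle_search_request(has_agent, user_accepts, download_agent):
    """Return "granted" or "refused" for one request to the search engine.

    has_agent:      result of the server's agent detection (step S4).
    user_accepts:   callable that shows the warning and returns True or False
                    (steps S6 and S7); hypothetical.
    download_agent: callable that sends the agent to the client (step S9);
                    hypothetical.
    """
    if has_agent:
        return "granted"   # step S5: agent already present
    if not user_accepts():
        return "refused"   # step S8: no acceptance, no access
    download_agent()       # step S9; installation (S10, S11) follows on the client
    return "granted"       # step S5, once the agent is installed and running
```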

[0028] The operation of the agent on the client will
now be described in more detail with reference to the 
flow diagram of Figure 3. 

[0029] In step S20 the client loads a web page from
a web server 30 or 40. The agent picks up the URL and
determines a catalogue for the URL (step S21). The
catalogue can comprise any descriptive information. In this
embodiment the process comprises the extraction of
keywords from the hypertext mark up language (HTML).
Methods for determining a catalogue for a web page are
well known in the art and it will be apparent to a skilled
person in the art that any known technique can be used
for determining the catalogue.
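
As one possible illustration of determining a catalogue, the sketch below extracts the most frequent words from the visible text and from the keywords and description metatags of an HTML page, using only the Python standard library. The stop-word list, the frequency ranking and the names `build_catalogue` and `_TextCollector` are assumptions made for this example; any known cataloguing technique could be substituted.

```python
import re
from collections import Counter
from html.parser import HTMLParser

# Small illustrative stop-word list (an assumption, not part of the description).
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "for", "on"}


class _TextCollector(HTMLParser):
    """Collects visible text plus keywords/description metatag content."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip = 0  # depth inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1
        elif tag == "meta":
            attrs = dict(attrs)
            if (attrs.get("name") or "").lower() in ("keywords", "description"):
                self.chunks.append(attrs.get("content") or "")

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.chunks.append(data)


def build_catalogue(html, max_keywords=10):
    """Return up to max_keywords of the most frequent words on the page."""
    parser = _TextCollector()
    parser.feed(html)
    parser.close()
    words = re.findall(r"[a-z]+", " ".join(parser.chunks).lower())
    counts = Counter(w for w in words if w not in STOP_WORDS and len(w) > 2)
    return [word for word, _ in counts.most_common(max_keywords)]
```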

[0030] In step S22 the agent checks the catalogue for
the relevance of the page against a profile comprising
keywords which represent the interest of the group of
users. Thus if in step S23 it is determined that the page
is not relevant since the keywords for the page do not
significantly match the keywords for the profile, the
page is ignored in step S24. If it is determined that the
keywords (or a significant number of them) match, in
step S25 the agent uploads the URL and the catalogue
to the search server. Once the URL and catalogue have
been uploaded to the search server if the page is
relevant, in step S26 the agent determines whether the links
on the page are to be catalogued. This can either be a
preset parameter for the agent or the agent can
determine this based upon the bandwidth (i.e. modem speed
or LAN connection - or even mobile link speed) and
processing power of the client. Also the server response
time can be taken into account. If the links are not to be
catalogued the process terminates in step S27. If the
links are to be catalogued, in step S28 the agent will
determine the level of links to be catalogued. Once again
the level of the links can be a predetermined number of
links, or it can be based upon the processing power or
bandwidth available to the client. This avoids too large
a proportion of the processing power or communication
bandwidth being taken up by the cataloguing process
(a spidering process) and avoids a significant
downgrading of the performance of the user's machine due to
the "spidering" process. In step S29 it is determined
whether the current cataloguing level has been reached
and if so the process terminates in step S27. If not, in
step S30 the agent searches for linked web pages which
have not yet been catalogued. If there are none in step
S31 the process is terminated at step S27. If there are
still linked web pages to be catalogued, in step S32 the
agent sends a request and receives a linked page. The
agent then determines a catalogue for the URL in step
S33 and the process returns to step S22 for the
determination of the relevance of the page against the
keywords.
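
The level-limited cataloguing of linked pages can be sketched as a breadth-first traversal. This is a minimal sketch under stated assumptions: `fetch`, `is_relevant` and `upload` are hypothetical callables supplied by the caller, the level is counted in link hops from the visited page, and links found on a page judged not relevant are not followed (the description leaves that last point open).

```python
from collections import deque


def catalogue_with_links(start_url, fetch, is_relevant, upload, max_level=1):
    """Catalogue a visited page and linked pages up to max_level link hops.

    fetch(url) -> (catalogue, links)   hypothetical page retrieval and cataloguing
    is_relevant(catalogue) -> bool     comparison against the group profile
    upload(url, catalogue)             sends an index entry to the search server
    """
    seen = {start_url}
    queue = deque([(start_url, 0)])
    while queue:
        url, level = queue.popleft()
        catalogue, links = fetch(url)      # steps S20/S21 or S32/S33
        if not is_relevant(catalogue):     # steps S22 and S23
            continue                       # step S24: page ignored
        upload(url, catalogue)             # step S25
        if level >= max_level:             # step S29: cataloguing level reached
            continue
        for link in links:                 # step S30: links not yet catalogued
            if link not in seen:
                seen.add(link)
                queue.append((link, level + 1))
```

Keeping `max_level` small is what bounds the processing power and bandwidth the agent consumes on the user's machine.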

[0031] It can thus be seen that the process of Figure
3 will continue until all of the pages to a predefined level
have been catalogued and, where relevant the URLs 
and catalogues for the pages have been uploaded to 
the search server. 

[0032] Thus in this embodiment of the present inven- 
tion not only is the page which has been visited by the 
web browser catalogued and used to update the database,
but linked pages can also be used to update the database.
Thus the activity of the client machine is automatically
monitored and when any relevant pages are
detected these are used to update the central database 
for the good of the group of users. This ensures that 
when many clients are operating, the database is up- 
dated within the focus defined by the profile used by 
each of the agents. The profile in this embodiment com- 
prises a carefully chosen selection of keywords. 
[0033] The operation of the server upon receipt of the 
URL and catalogue from the agent will now be described 
in more detail with reference to the flow diagram of Fig- 
ure 4. 

[0034] In step S40 a URL and catalogue is received 
from the agent. It is then determined whether the URL 
is already in the database (step S41) and if so in step 
S42 it is determined whether the entry in the database 
has been updated recently or not. If it has been updated 
recently and the entry is not old (step S43) the URL and 
catalogue is ignored and the process terminates in step 
S44. If the entry in the database for the URL is older 
than the predetermined age, in step S45 the received 
URL and catalogue is used to update the URL and cat- 
alogue in the database and the process proceeds to 
step S47. 

[0035] If in step S41 it is determined that the URL is 
not in the database, in step S46 the URL and catalogue 
received from the agent is added to the database. Once 
the database has either been added to or updated (step 
S46 or step S45) in step S47 the database entry is 
marked with the date so that the age of the entries in the 
database can be monitored particularly with regard to 
step S42. 
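
Steps S40 to S47 can be illustrated with the following sketch, in which an in-memory dictionary stands in for the database 20. The 30-day value for the predetermined age, the record layout and the function name are assumptions made for this example only.

```python
from datetime import datetime, timedelta

MAX_AGE = timedelta(days=30)  # assumed value for the "predetermined age"


def receive_entry(database, url, catalogue, now=None, max_age=MAX_AGE):
    """Apply steps S40 to S47; returns "added", "updated" or "ignored"."""
    if now is None:
        now = datetime.now()
    entry = database.get(url)                                    # step S41
    if entry is not None and now - entry["updated"] <= max_age:
        return "ignored"                                         # steps S42 to S44
    action = "added" if entry is None else "updated"             # step S46 or S45
    # Step S47: the entry is stored together with the date of the update.
    database[url] = {"catalogue": list(catalogue), "updated": now}
    return action
```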

[0036] In order to further expand the database, in step
S48 the spider application within the search server
requests and receives the page for the URL which has
been added to or updated in the database. In step S49
the spider application then searches for any linked web
pages on the received page. If there are none (step
S50), the process terminates in step S44. If there are
linked web pages, in step S51 it is determined whether
the URLs are in the database. If so in step S52 it is
determined whether the entries in the database are older
than a predetermined age and if not the process
terminates in step S44. If the entries are old (step S52) or if
the URLs are not in the database, in step S53 the spider
application requests and receives pages for the URLs.
The spider application then determines catalogues for
the pages in step S54 and in step S55 it is determined
whether the pages are relevant or not by comparing the
keywords in the catalogue to the keywords stored as the
profile. If the pages are determined not to be relevant,
in step S56 the pages are ignored and in step S44 the
process terminates.

[0037] If the pages are determined to be relevant
(step S55) in step S57 the URLs and catalogues are
added to or updated in the database. The database entries
are then marked with the date in step S58 and the
process terminates in step S44.

[0038] Thus in this process illustrated in Figure 4, the
database is updated using:

1. a catalogue for the URL visited by the user of a
client;

2. catalogues for pages linked from the visited page
as determined by the client; and

3. catalogues for links from the visited web page as
determined by the server.

[0039] The benefit of also providing for a spidering
capability at the server is that the client may be provided
with a limited spidering capability, e.g. the level of the links
to be followed by the spider in the client can be limited.
This limits the processing power and bandwidth taken
up by the agent. The full spidering process can thus be
completed or indeed fully carried out by the server.
[0040] In addition to the spidering process carried out
to supplement the catalogues received from the agents,
the server can also periodically update the database.
This process will be described in more detail with
reference to the flow diagram of Figure 5.

[0041] In step S60 periodically the spider application
will look at the URLs in the database and in step S61 a
determination will be made as to whether any have not
recently been updated. If all of the entries have recently
been updated, the process terminates in step S68. If
there are entries in the database which have not
recently been updated (step S61) the spider application
requests and receives web pages for the URLs (step
S62). The spider application then determines
catalogues for the pages (step S63) and checks the
relevance of the catalogue against the keywords (step S64).
If the pages are not relevant (step S65) in step S69 the
URLs are deleted from the database and the process
terminates in step S68.

[0042] If the pages are determined to be relevant 
(step S65) in step S66, the URLs and the catalogues in 
the database are updated and in step S67 the database 
entries are marked with the date of update. The process 
then terminates in step S68. 
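
The periodic update of Figure 5 might be sketched as below, again with a dictionary standing in for the database 20. The record layout, the 30-day age limit and the callables `fetch_catalogue` and `is_relevant` are hypothetical and serve only to make the example self-contained.

```python
from datetime import datetime, timedelta


def refresh_database(database, fetch_catalogue, is_relevant, now,
                     max_age=timedelta(days=30)):
    """Re-check stale entries (steps S60 to S69); returns the URLs deleted."""
    stale = [url for url, entry in database.items()
             if now - entry["updated"] > max_age]                 # step S61
    deleted = []
    for url in stale:
        catalogue = fetch_catalogue(url)                          # steps S62, S63
        if is_relevant(catalogue):                                # steps S64, S65
            # Steps S66 and S67: refresh the catalogue and mark the date.
            database[url] = {"catalogue": catalogue, "updated": now}
        else:
            del database[url]                                     # step S69
            deleted.append(url)
    return deleted
```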

[0043] The process of Figure 5 thus comprises a con- 
ventional periodic spidering process in order to keep the 
database up to date. It enables the database to be 
pruned to remove pages which are no longer rele-
vant.

[0044] Although the present invention has been de- 
scribed hereinabove with reference to a specific embod- 
iment, it will be apparent to a skilled person in the art 
that modifications lie within the spirit and scope of the 
present invention. 

[0045] For example although in the embodiment the 
spider application 3 is illustrated as residing in the 
search server 10, the spider application can in fact re- 
side on any physical server on the Internet 50. The spi- 
der application could then independently receive the 
URLs which are also sent to the search server for up- 
dating the database so that the spider can spider from 
these URLs. The resultant relevant links can then be 
submitted to the search engine much in the same way 
as relevant links are submitted by agents. 
[0046] Although in the embodiment the determination 
of the relevance of the page is implemented by the 
agent, alternatively, this function could be given to the 
search server. Thus in this case the agents 62 and 72 
will transmit catalogues and URLs for all pages visited 
by the web browser. Also catalogues and URLs for all 
links from the visited page can be sent to the search 
server. It can thus be left for the search server to deter- 
mine the relevance of the pages for the updating of the 
database. This process is however less preferred since 
it increases the amount of data that has to be transmitted
by the agents to the search server. Although in the spe- 
cific embodiment the matching process between the 
profile and the descriptive information (the catalogue) 
was performed using keywords, the present invention 
can be applied to the use of any form of descriptive in- 
formation. The present invention is particularly suited to 
the use of text which can allow keyword matching either 
strictly or on the basis of synonyms or natural language 
matching of text. It is also possible to define a profile as 
comprising meta information such as the date of down-
loading of the web page by the web browser or the
address of the originating site. The profile can comprise 
any information which allows for the definition of the 
common interests of the group of users using the clients 
60 and 70. 

[0047] Although in the present invention the network 
on which the clients and the search server are connect- 
ed is described as comprising the Internet, the present 
invention is applicable to any network and can for example
comprise an Intranet, Extranet, or local area network.
The present invention is more widely applicable
to any form of information retrieval such as document
retrieval over a network wherein a central database of
index information is stored to allow for searching for
stored information.

[0048] The determination of the relevance of the
stored information need not be just based on the profile. 
The relevance can also be determined based on wheth- 
er or not the database has recently been updated for 
that address. 

[0049] In addition to updating the database using re-
trieved information which matches the profile, a user can
select to update the database using any retrieved infor- 
mation by manually selecting it. 
[0050] The present invention is not limited just to the
use of the Internet using web addresses. The present 
invention is applicable to any logical addressing system 
and for example covers all protocols using URLs e.g. 
HTTP, FTP, POP, and SMTP. 

[0051] The present invention is ideally suited to the 
searching needs of a specific interest or community. The 
central database can self-focus, expand and update au- 
tomatically based on the behaviour of the members of 
the group. The common interests of the group can be 
defined by a suitable profile such as keywords and this 
keeps the domain of the search focused. However, the 
focusing of the search database does not prevent it be- 
ing amended and expanded when users view a site that 
is not currently indexed. So long as the site falls within 
the current field of interest as defined by the profile, the 
site will automatically be indexed by the agent and the 
database updated. 

[0052] The advantages of this arrangement are that 
the user community can focus on the development and 
usefulness of the search index over time. The users
can update the search catalogue database automatical- 
ly themselves thus effectively distributing the process- 
ing task and requirement for bandwidth over many us- 
ers. 

[0053] In the embodiment, the database is described 
as being updated as soon as a URL is passed from an 
agent, however, it is possible for the updating process 
to be modified such that the database is only updated 
when the URL is submitted by agents a predetermined 
number of times. This would indicate that one or a
number of users have visited the site more than once,
clearly indicating that the site is relevant and should be
added to the database. 
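
A submission threshold of this kind could be kept with a simple counter, as in the following sketch. The class name, the default of two submissions and the choice to count submissions per URL regardless of which agent sent them are assumptions made for the example.

```python
from collections import Counter


class SubmissionGate:
    """Holds back a URL until it has been submitted a set number of times."""

    def __init__(self, required=2):
        self.required = required   # assumed "predetermined number of times"
        self.counts = Counter()

    def submit(self, url):
        """Record one submission; return True once the URL may be indexed."""
        self.counts[url] += 1
        return self.counts[url] >= self.required
```

The server would then add or update the database entry only when `submit` returns True.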



EP 1 207 468 A2

Claims

1. A method of providing a group of users with a
search facility for information stored in a plurality of
addressable logical locations, the method comprising:

providing a database of index information for
information stored at a plurality of the logical
locations, the index information including the
address of the logical locations and descriptive
information for information stored at each logical
location, the descriptive information matching
a common profile of interest to the group of
users;

monitoring accessing and retrieval of stored in- 
formation by a user in the group; 
deriving descriptive information using the re- 
trieved information; 

determining the relevance of the retrieved in- 
formation by comparing the descriptive infor- 
mation to the profile; and 
updating the database using the address and 
descriptive information of any determined rele- 
vant retrieved information. 

2. A method according to claim 1, wherein the data- 
base is provided in a server and the accessing and 
retrieval of stored information, the monitoring of ac- 
cessing and retrieval, and the deriving of the de- 
scriptive information take place in a client connect- 
ed to the server by a network, the method including 
sending the address and the derived descriptive in- 
formation to the server for the updating of the data- 
base. 

3. A method according to claim 2, including sending 
an initial request from the client to access the data- 
base at the server, and downloading an agent from 
the server to the client, wherein the agent performs 
the monitoring, determining and sending process- 
es. 

4. A method according to claim 3, wherein after having 
received the initial request, the server sends a 
warning to the client that the agent must be down- 
loaded before allowing access to the database and 
awaits a confirmatory input from the user before 
downloading the agent, and the server only permits
access to the database by the client if the agent is 
loaded thereon. 

5. A method according to any one of claims 2 to 4, 
wherein the determining of the relevance of the re- 
trieved information takes place in the client and the 
address and the descriptive information for only re- 
trieved information determined to be relevant is sent 
to the server. 

6. A method according to any preceding claim, wherein
the stored information at at least some of the
addresses has links to stored information at other
addresses, the method including:

when the retrieved information has links, accessing
and retrieving information stored at other addresses;
deriving descriptive information using the
retrieved information;
determining the relevance of the retrieved
information by comparing the descriptive
information to the profile; and
updating the database using the address and
descriptive information of any determined
relevant retrieved information.

7. A method according to any preceding claim, including
periodically checking the database to identify
any index information which has not been updated
recently, if any index information is identified,
accessing and retrieving information stored at any of
the addresses in the identified index information,
deriving descriptive information using the retrieved
information, determining the relevance of the
retrieved information by comparing the descriptive
information to the profile, and updating the database
using the address and descriptive information of
any determined relevant retrieved information.
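
The periodic check recited in claim 7 could be sketched as below. This is an illustrative sketch only: the entry layout (a `keywords` set and an `updated` timestamp), the `fetch` callback, the `max_age` parameter and the choice to leave entries untouched when the re-fetched page is no longer relevant are all assumptions that the claim does not specify.

```python
import re
import time
from typing import Callable, Dict, List, Optional, Set


def refresh_stale(index: Dict[str, dict], profile: Set[str],
                  fetch: Callable[[str], Optional[str]],
                  max_age: float, now: Optional[float] = None) -> List[str]:
    """Re-fetch every entry older than `max_age` seconds and update it
    if the retrieved text still matches the profile."""
    now = time.time() if now is None else now
    refreshed = []
    for url, entry in list(index.items()):
        if now - entry["updated"] <= max_age:
            continue                      # updated recently enough
        text = fetch(url)
        if text is None:
            continue                      # nothing retrieved; leave as is
        # Derive descriptive information and compare it to the profile.
        keywords = set(re.findall(r"[a-z0-9]+", text.lower())) & profile
        if keywords:
            index[url] = {"keywords": keywords, "updated": now}
            refreshed.append(url)
    return refreshed
```

The `fetch` argument is injected so that the same routine can be exercised against a dictionary of canned pages instead of a live network.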

8. A method according to any preceding claim, wherein
the stored information includes text, the profile
comprises descriptive text, and the descriptive
information comprises text.

9. A method according to claim 8, wherein the profile
comprises keywords, the descriptive information
comprises keywords, and the relevance of the
retrieved information is determined by matching the
keywords of the profile to the descriptive
information.

10. A method according to claim 9, wherein the matching
process includes lexical matching of synonyms.
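
One possible reading of lexical matching of synonyms is sketched below. It is an assumption-laden illustration: the synonym table is a hypothetical hand-written dictionary, and the helper names `expand` and `matches_profile` do not appear in the specification.

```python
from typing import Dict, Set


def expand(keywords: Set[str], synonyms: Dict[str, Set[str]]) -> Set[str]:
    """Return the keywords together with any synonyms listed for them."""
    expanded = set(keywords)
    for word in keywords:
        expanded |= synonyms.get(word, set())
    return expanded


def matches_profile(description: Set[str], profile: Set[str],
                    synonyms: Dict[str, Set[str]]) -> bool:
    """True when a profile keyword, or one of its synonyms, matches a
    keyword (or synonym) of the descriptive information."""
    return bool(expand(profile, synonyms) & expand(description, synonyms))
```

Both sides are expanded so that a one-directional synonym table still yields a match whichever of the two terms appears in the profile.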

11. A method according to claim 8, wherein the relevance
of the retrieved information is determined by
a natural language matching of the text of the profile
with the text of the descriptive information.

12. A method according to any preceding claim, for
implementation over an Internet Protocol network,
wherein the stored information includes hypertext
mark-up language, the addresses comprise Uniform
Resource Locators, and the monitoring process
monitors the accessing and retrieval of hypertext
mark-up language by a web browser application
for the display of web pages.

13. A system for providing a group of users with a 
search facility for information stored on a plurality 
of addressable logical locations, the system com- 
prising: 


a database of index information for information 
stored at a plurality of the logical locations, the 
index information including the addresses of 
the logical locations and descriptive information
for information stored at each logical location,
the descriptive information matching a
common profile of interest to the group of users; 
monitoring means for monitoring access and 
retrieval of stored information by a user in the 
group; 

deriving means for deriving descriptive infor- 
mation using the retrieved information; 
determining means for determining the rele- 
vance of the retrieved information by compar- 
ing the descriptive information to the profile; 
and 

updating means for updating the database us- 
ing the address and descriptive information of 
any determined relevant retrieved information. 

14. A system according to claim 13, comprising a server 
and a client connected by a network, the server in- 
cluding the database and the updating means, the 
client including the monitoring means, the deriving 
means and sending means for sending the address 
and the derived descriptive information to the
server for the updating of the database.

15. A system according to claim 14, wherein the client 
includes a search client for sending an initial re- 
quest to access the database, the server includes 
download means for downloading an agent to the 
client, and said agent comprises said monitoring 
means, said deriving means, and said sending 
means. 

16. A system according to claim 15, wherein said server
includes download warning means for, after having 
received the initial request, sending a warning to the 
client that the agent must be downloaded before al- 
lowing access to the database and receiving a con- 
firmatory input from the user, and said download 
means is responsive to the confirmatory input to 
download the agent to the client. 

17. A system according to any one of claims 14 to 16, 
wherein the client includes the determining means, 
and the sending means is controlled to only send 
the descriptive information and the address for re- 
trieved information determined to be relevant. 

18. A system according to any one of claims 13 to 17,
wherein the stored information at at least some of the
addresses has links to stored information at other
addresses; the system including a search application
for accessing and retrieving information stored
at other linked addresses, deriving descriptive
information using the retrieved information; determining
the relevance of the retrieved information by
comparing the descriptive information to the profile;
and updating the database using the address and
descriptive information of any determined relevant
retrieved information.

19. A system according to any one of claims 13 to 18,
including updating means adapted to periodically
check the database to identify any index information
which has not been updated recently, to, if any index
information is identified, access and retrieve
information stored at any of the addresses in the
identified index information, to derive descriptive
information using the retrieved information, to determine
the relevance of the retrieved information by
comparing the descriptive information to the profile, and
to update the database using the address and
descriptive information of any determined relevant
retrieved information.

20. A system according to any one of claims 13 to 19,
wherein the stored information includes text, the
profile comprises descriptive text, and the
descriptive information comprises text.

21. A system according to claim 20, wherein the profile
comprises keywords, the descriptive information
comprises keywords, and the determining means is
adapted to determine the relevance by matching
the keywords of the profile to the descriptive
information.

22. A system according to claim 21, wherein the
determining means is adapted to perform a lexical
matching of synonyms.

23. A system according to claim 20, wherein said
determining means is adapted to determine the
relevance of the retrieved information by a natural
language matching of the text of the profile with the text
of the descriptive information.

24. A system according to claim 14, wherein the network
is an Internet Protocol network, the stored
information includes hypertext mark-up language, the
addresses comprise Uniform Resource Locators,
and the monitoring means is adapted to monitor the
accessing and retrieval of hypertext mark-up language
by a web browser application for the display
of web pages.

25. A server apparatus for providing a search service
to clients over a network to allow searching for
information stored in a plurality of addressable logical
locations over the network, the server apparatus
comprising:

a database of index information for information
stored at a plurality of the logical locations, the
index information including the addresses of
the logical locations and descriptive information
for information stored at each logical location,
the descriptive information matching a
common profile of interest to a group of users;
a network interface for receiving index information
comprising an address and corresponding
descriptive information derived by a client
apparatus from the information stored at the
address; and
a database interface for updating the database
using the received index information.

26. A server according to claim 25, including a user in- 
terface for receiving an initial request from the client 
to access the database, and downloading means 
for downloading an agent to the client, wherein said 
network interface is adapted to receive the index in- 
formation from the agent at the client. 

27. A server apparatus according to claim 26, wherein 
said user interface is adapted to, after having re- 
ceived the initial request, send a warning to the cli- 
ent that the agent must be downloaded before al- 
lowing access to the database, and to await a con- 
firmatory input from the user, the downloading 
means being adapted to download the agent to the 
client in response to the confirmatory input, and the 
database interface being adapted to permit access 
to the database by the client if the agent is detected 
loaded thereon. 

28. A server apparatus according to any one of claims 
25 to 27, including means for determining the rele- 
vance of the received index information by compar- 
ing the descriptive information in the index informa- 
tion received by the network interface to the profile, 
and the database interface is adapted to update the 
database using the descriptive information and the 
address for any index information determined to be 
relevant. 

29. A server apparatus according to any one of claims 
25 to 28, wherein the stored information at at least
some of the addresses has links to stored informa- 
tion at other addresses, the server apparatus in- 
cluding a search means for accessing and retriev- 
ing information at addresses in the received index 
information, for, when the retrieved information has 
links, accessing and retrieving information stored at 
other addresses, for deriving descriptive informa- 
tion using the retrieved information, for determining 
the relevance of the retrieved information by com- 
paring the descriptive information to the profile, and 
for controlling the database interface to update the 
database using the address and descriptive infor- 
mation of any determined relevant retrieved infor- 
mation. 

30. A server apparatus according to any one of claims
25 to 29, including database updating means for
periodically checking the database to identify any
index information which has not been updated recently,
for, if any index information is identified, accessing
and retrieving information stored at any of the
addresses in the identified index information, for
deriving descriptive information using the retrieved
information, for determining the relevance of the
retrieved information by comparing the descriptive
information to the profile, and for controlling the
database interface to update the database using the
address and descriptive information of any determined
relevant retrieved information.

31. A server apparatus according to any one of claims
25 to 30, wherein the stored information includes
text, the profile comprises descriptive text, and the
descriptive information comprises text.

32. A server apparatus according to claim 31, wherein
the profile comprises keywords and the descriptive
information comprises keywords.

33. A server apparatus according to claim 28, wherein
the profile comprises keywords, the descriptive
information comprises keywords, and the determining
means is adapted to determine the relevance of
the retrieved information by matching the keywords
of the profile to the descriptive information.

34. A server apparatus according to claim 33, wherein 
the determining means is adapted to match the key- 
words of the profile to the descriptive information 
using lexical matching of synonyms. 


35. A server apparatus according to claim 28, wherein
the stored information includes text, the profile
comprises descriptive text, the descriptive information
comprises text, and the determining means is
adapted to determine the relevance of the retrieved
information by a natural language matching of the
text of the profile with the text of the descriptive
information.

36. A server apparatus according to any one of claims
25 to 35, wherein the network interface is adapted
to interface to an Internet Protocol network, the
stored information includes hypertext mark-up language,
the addresses comprise Uniform Resource
Locators, and the server includes a web server to
act as the interface between requests from the client
to the database interface.

37. A method of operating a server providing a search
service to clients over a network to allow searching
for information stored in a plurality of addressable
logical locations over the network, the method
comprising:

receiving from a client index information comprising
an address for stored information retrieved
by a client and descriptive information
derived from the stored information; and
updating a database of index information for
information stored at a plurality of addressable
logical locations using the received index information,
wherein the index information in the database
includes addresses of the logical locations
and descriptive information for information
stored at each logical location, the descriptive
information matching a common profile of
interest to a group of users.



38. A method according to claim 37 including receiving 
an initial request to access the database from a cli- 
ent, and downloading an agent to the client, wherein 
the index information is received from the agent on 
the client. 

39. A method according to claim 38 including sending 
to the client in response to the initial request a warn- 
ing that the agent must be downloaded before al- 
lowing access to the database, controlling the 
downloading dependent upon a confirmatory re- 
sponse from the client, and permitting access by the 
client to the database if the agent is detected loaded 
on the client. 

40. A method according to any one of claims 37 to 39
including determining the relevance of the received
index information by comparing the descriptive
information therein to the profile, wherein the
database is updated using only index information
determined to be relevant.

41. A method according to any one of claims 37 to 40
wherein the stored information at at least some of
the addresses has links to stored information at
other addresses, the method including:

accessing and retrieving information at
addresses in the received index information;
when the retrieved information has links,
accessing and retrieving information at other
addresses;
deriving descriptive information using the
retrieved information; and
determining the relevance of the retrieved
information by comparing the descriptive
information to the profile, and updating the database
using only address and descriptive information
of any determined relevant retrieved
information.

42. A method according to any one of claims 37 to 41
including periodically checking the database to
identify any index information which has not been
updated recently; if any index information is
identified, accessing and retrieving information stored at
any of the addresses in the identified index
information; deriving descriptive information using the
retrieved information; determining the relevance of
the retrieved information by comparing the
descriptive information to the profile; and updating the
database using the address and descriptive
information of any determined relevant retrieved
information.

43. A method according to any one of claims 37 to 42,
wherein the stored information includes text, the
profile comprises descriptive text, and the
descriptive information comprises text.

44. A method according to claim 43, wherein the profile
comprises keywords and the descriptive
information comprises keywords.

45. A method according to claim 40 wherein the profile
comprises keywords, the descriptive information
comprises keywords, and the relevance of the
retrieved information is determined by matching the
keywords of the profile to the keywords of the
descriptive information.

46. A method according to claim 45 wherein the
keywords of the profile are matched to the keywords of
the descriptive information by lexical matching of
synonyms.

47. A method according to claim 40 wherein the stored
information includes text, the profile comprises
descriptive text, the descriptive information comprises
text, and the relevance of the retrieved information
is determined by natural language matching of the
text of the profile with the text of the descriptive
information.

48. A method according to any one of claims 37 to 47
wherein the network is an Internet Protocol network,
the stored information includes hypertext mark-up
language, the addresses comprise Uniform
Resource Locators, and a web browser receives
reports from the client to the database.

49. A client apparatus for accessing a server apparatus
providing a search service to clients over a network
to allow searching for information stored in a
plurality of addressable logical locations over the
network, the client apparatus comprising:

monitoring means for monitoring accessing
and retrieval of stored information by an
application on the client;
deriving means for deriving descriptive
information using the retrieved information;
determining means for determining the
relevance of the retrieved information by
comparing the descriptive information to a profile; and
sending means for sending the relevant derived
descriptive information and the corresponding
address to the server apparatus for the
updating of a database.

50. A client apparatus according to claim 49 including 
search means for sending an initial request to the
server apparatus to access the database, and re- 
ceiving means for receiving an agent from the serv- 
er to the client, wherein the agent comprises the 
monitoring means, the deriving means, the deter- 
mining means and the sending means. 

51. A client apparatus according to claim 50 wherein 
said receiving means is adapted to receive a warn- 
ing from the server apparatus that the agent must 
be downloaded before allowing access to the data- 
base, the client apparatus including input means to 
allow a user to input a confirmatory input for trans- 
mission to the server apparatus. 

52. A client apparatus according to any one of claims
49 to 51 wherein the stored information at at least
some of the addresses has links to stored information
at other addresses; the client apparatus including
search means for accessing and retrieving
information stored at other linked addresses, for
deriving descriptive information using the retrieved
information, for determining the relevance of the
retrieved information by comparing the descriptive
information to the profile, and for sending the relevant
derived descriptive information and the corresponding
address to the server apparatus for the updating
of the database.

53. A client apparatus according to any one of claims 
49 to 52, wherein the stored information includes 
text, the profile comprises descriptive text, and the 
descriptive information comprises text. 

54. A client apparatus according to claim 53, wherein 
the profile comprises keywords, the descriptive in- 
formation comprises keywords, and the determin- 
ing means is adapted to determine the relevance 
by matching the keywords of the profile to the de- 
scriptive information. 

55. A client apparatus according to claim 54, wherein 
the determining means is adapted to perform a lex- 
ical matching of synonyms. 

56. A client apparatus according to claim 53, wherein
said determining means is adapted to determine the
relevance of the retrieved information by a natural
language matching of the text of the profile with the
text of the descriptive information.

57. A client apparatus according to any one of claims
49 to 56, wherein the network is an Internet Protocol
network, the stored information includes hypertext
mark-up language, the addresses comprise Uniform
Resource Locators, and the monitoring means
is adapted to monitor the accessing and retrieval of
hypertext mark-up language by a web browser
application for the display of web pages.

58. A method of operating a client for accessing a server
providing a search service to clients over a network
to allow searching for information stored in a
plurality of addressable logical locations over the
network, the method comprising:

monitoring accessing and retrieval of stored
information by an application on the client;
deriving descriptive information using the
retrieved information;
determining the relevance of the retrieved
information by comparing the descriptive
information to a profile; and
sending the relevant derived descriptive
information and corresponding address to the
server for the updating of a database.

59. A method according to claim 58 including sending
an initial request to the server to access the
database, and receiving an agent from the server,
wherein the agent performs the monitoring, deriving,
determining, and sending processes.

60. A method according to claim 59 including receiving
a warning from the server that the agent must be
downloaded before allowing access to the database,
and allowing the user to input a confirmatory
input for transmission to the server apparatus.

61. A method according to any one of claims 58 to 60
wherein the stored information at at least some of
the addresses has links to stored information at
other addresses, the method including accessing and
retrieving information stored at other linked
addresses, deriving descriptive information using the
retrieved information, determining the relevance of
the retrieved information by comparing the
descriptive information to the profile, and sending the
relevant derived descriptive information and
corresponding address to the server for the updating of
the database.

62. A method according to any one of claims 58 to 61
wherein the stored information includes text, the
profile comprises descriptive text, and the
descriptive information comprises text.

63. A method according to claim 62 wherein the profile
comprises keywords, the descriptive information
comprises keywords, and the relevance of the
stored information is determined by matching the
keywords of the profile to the keywords of the
descriptive information for the stored information.

64. A method according to claim 63 wherein the matching
process comprises a lexical matching of
synonyms.

65. A method according to claim 62 wherein the rele- 
vance of the retrieved information is determined by 
a natural language matching of the text of the profile 
with the text of the descriptive information.

66. A method according to any one of claims 58 to 65
wherein the network comprises an Internet Protocol
network, the stored information includes hypertext
markup language, the addresses comprise uniform
resource locators, and the monitoring process
comprises monitoring the accessing and retrieval of
hypertext markup language by a web browser
application for the display of web pages.

67. Client apparatus for accessing a server apparatus
providing a search facility to client apparatuses over
a network to allow searching for information stored
in a plurality of addressable logical locations over
the network, the client apparatus comprising:

programme storage for storing programme
code for controlling a processor; and
a processor for implementing the programme
code stored in the programme storage;

wherein the programme code comprises programme
code for controlling the processor to:

monitor the accessing and retrieval of stored
information by an application,
derive descriptive information using the
retrieved information,
determine the relevance of the retrieved
information by comparing the descriptive
information to a profile, and
send the relevant derived descriptive information
and the corresponding address to the
server apparatus for the updating of a database.

68. A computer programme code for controlling a com- 
puter to implement the method of any one of claims 
37 to 48 or 58 to 66. 

69. A carrier medium carrying the computer programme
code according to claim 68.